spark technology center
A developer's view on IBM's Open Tech AI strategy - IBM Code
IBM has a clear strategy for Open Tech AI. I'm sure you remember 2011, when IBM Watson defeated the two world champions at Jeopardy. Apart from being backed by an impressive 2,880 IBM POWER7 cores providing 11,520 hardware threads and 16 TB of main memory, IBM Watson was running on Linux and using Hadoop. A lot has happened since then, and IBM is now one of the world leaders in AI technology and services. But what I'm especially proud of is IBM's clear strategy to create, support, and enhance Open Tech AI.
Spark Technology Center
Here's the deal: you've probably never heard of SystemML, but you definitely need to know what it is. Not only will SystemML make you look awesome, because machine learning is the hot topic right now, but it will also save you a lot of time and trouble. As a new data scientist, I constantly have to spend my time learning new technologies, most of which don't work very well. Here's the thing: SystemML actually does work very well. Because it only recently became open source, it's difficult to find material on how to get started, but that's quickly changing.
Spark Technology Center
The Best Paper award for this year's International Conference on Very Large Data Bases (VLDB) goes to "Compressed Linear Algebra for Large-Scale Machine Learning", authored by a PhD candidate at the University of Maryland and four senior researchers from IBM. Their method for compressing matrices used in linear algebra operations promises to give users significantly faster processing while using less memory. In particular, the compression technology provides benefits at two different points in the data science process. Before training a model, a data scientist typically goes through multiple iterations of feature engineering. Common feature engineering tasks include examining the data with descriptive statistics and transforming the values in columns to better suit the assumptions built into different types of machine learning models.
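To make the idea of running linear algebra on compressed matrices a little more concrete, here is a minimal sketch in Python. It is not the paper's actual encoding scheme: it assumes a simple dictionary encoding per column, and the function names (`compress_column`, `matvec_compressed`) and the toy data are hypothetical. The point it illustrates is that a matrix-vector product can be computed on the encoded columns directly, scaling each distinct value only once.

```python
from typing import Dict, List, Tuple

# One encoded column: (distinct values, per-row index into those values).
Column = Tuple[List[float], List[int]]


def compress_column(values: List[float]) -> Column:
    """Dictionary-encode one column so each distinct value is stored once."""
    dictionary: List[float] = []
    position: Dict[float, int] = {}
    codes: List[int] = []
    for v in values:
        if v not in position:
            position[v] = len(dictionary)
            dictionary.append(v)
        codes.append(position[v])
    return dictionary, codes


def matvec_compressed(columns: List[Column], vector: List[float], n_rows: int) -> List[float]:
    """Compute X @ vector on the encoded columns without decompressing X."""
    result = [0.0] * n_rows
    for (dictionary, codes), weight in zip(columns, vector):
        # Scale each distinct value once, then reuse it for every row holding it.
        scaled = [d * weight for d in dictionary]
        for row, code in enumerate(codes):
            result[row] += scaled[code]
    return result


# Toy 4x2 matrix with few distinct values per column (made-up data).
rows = [[1.0, 5.0], [1.0, 7.0], [2.0, 5.0], [1.0, 5.0]]
columns = [compress_column([r[j] for r in rows]) for j in range(2)]
product = matvec_compressed(columns, [2.0, 1.0], n_rows=len(rows))
print(product)
```

When a column has only a handful of distinct values, the dictionary stays small, so both the storage and the number of multiplications shrink compared with the uncompressed column.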
Spark Technology Center
Now that the dust has settled on Apache Spark 2.0, the community has a chance to catch its collective breath and reflect a little on what was achieved in the largest and most complex release in the project's history. One of the main goals of the machine learning team here at the Spark Technology Center is to continue to evolve Apache Spark as the foundation for end-to-end, continuous, intelligent enterprise applications. With that in mind, we'll briefly mention some of the major new features in the 2.0 release of Spark's machine learning library, MLlib, as well as a few important changes beneath the surface. Finally, we'll cast our minds forward to what may lie ahead for version 2.1 and beyond. For MLlib, there were a few major highlights in Spark 2.0. While these have already been well covered elsewhere, the STC team has worked hard to help make these initiatives a reality -- congratulations!
Spark Technology Center
This tutorial will get you set up and running SystemML on the Spark Shell like a star. But first, to refresh your memory, let me remind you that I am on a quest to create a life-changing app! I am new to the world of data science and am currently tackling the challenge of building an app using Apache SystemML and Apache Spark one step at a time. If you haven't already, make sure to check out my previous tutorials, which start here. So far we've daydreamed about delightful data, complained about how hard it is to find good data, found good data, learned how to write Scala and NOW we will learn how to access SystemML from the Spark Shell.
Spark Technology Center
The reputation of SystemML is on the rise. This flexible machine learning system scales automatically to Spark and Hadoop clusters and offers faster analysis on fewer nodes -- with substantial improvements in accuracy. In this presentation from Spark Summit in June 2016, researcher Fred Reiss spells out use cases for custom ML algorithms -- across the spectrum from auto manufacturing to the Watson Health initiative.
Go global with data science at Datapalooza
Over at Data-Mania, I've started a blog series on analytics as a service, specifically focusing on IBM's Watson Analytics technology. I originally planned this series because I wanted to quash some recurring worries I've heard among the data science community. Accordingly, I'll be demonstrating that IBM's Watson Analytics technology was unequivocally not designed to annihilate the careers of data science professionals. What's more, I'll be underscoring just how much IBM is doing to foster, cultivate and enrich the careers of working professionals in the big data and data science space. The Datapalooza event is one major avenue by which IBM is supporting the data science community, seeking to educate the "next generation of data scientists on how to apply their minds, creativity, and tools in the creation of innovative data products." That sounds pretty cool, right?